Client Report - Show me!

Unit 4 Task 3

Author

[STUDENT NAME]

import pandas as pd
import numpy as np
from lets_plot import *

from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn import metrics

LetsPlot.setup_html(isolated_frame=True)
# import your data here using pandas and the URL
dwellings = pd.read_csv('https://github.com/byuidatascience/data4dwellings/raw/master/data-raw/dwellings_denver/dwellings_denver.csv')

dwellings_ml = pd.read_csv('https://github.com/byuidatascience/data4dwellings/raw/master/data-raw/dwellings_ml/dwellings_ml.csv')

# code from the last HW
# Note: filter() silently skips any listed name that is not a column in dwellings_ml.
x = dwellings_ml.filter([
    "quality", "condition", "livearea", "stories", "arcstyle", "basement",
    "condition_Fair", "nocars", "numbdrm", "netprice", "numbaths", "sprice",
    "qualified_Q", "deduct", "finbsmnt", "abstrprd",
])

y = dwellings_ml['before1980'] 

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = .2, random_state = 34)

# Despite the _DT suffix, this is a gradient boosting classifier.
classifier_DT = GradientBoostingClassifier(max_depth = 10)

classifier_DT.fit(x_train, y_train)

y_predicted_DT = classifier_DT.predict(x_test)

print("Accuracy:", metrics.accuracy_score(y_test, y_predicted_DT))
Accuracy: 0.9159938904647611
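
Question 1 relies on a ranking of the most important variables. The following is a minimal sketch of how such a ranking can be read from a fitted gradient boosting model through its feature_importances_ attribute. It is not the Unit 4 Task 2 code: the data is synthetic, the label rule is a made-up assumption, and the names features, label, model, and importances are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic stand-in for dwellings_ml; every value here is made up for illustration.
rng = np.random.default_rng(0)
n = 500
features = pd.DataFrame({
    "livearea": rng.normal(2000, 600, n),  # drives the label below
    "numbdrm": rng.integers(1, 6, n),      # unrelated to the label by construction
})
# Hypothetical label rule: smaller homes are marked as built before 1980.
label = (features["livearea"] + rng.normal(0, 200, n) < 2000).astype(int)

model = GradientBoostingClassifier(n_estimators=50, max_depth=3, random_state=0)
model.fit(features, label)

# feature_importances_ follows the order of the training columns.
importances = pd.Series(
    model.feature_importances_, index=features.columns
).sort_values(ascending=False)
print(importances)
```

With the real data, the same pattern would be applied to classifier_DT and the columns of x_train.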

QUESTION 1

Create 2-3 charts that evaluate the relationships between each of the top 2 or 3 most important variables (as found in Unit 4 Task 2) and the year the home was built. Describe what you learn from the charts about how that variable is related to year built.


# stories, livearea, and abstrprd are the top 3 variables

plot1 = ggplot(dwellings_ml, aes(x='yrbuilt', y='livearea')) + \
    geom_point(alpha=0.3) + ggtitle('Live Area vs. Year Built')

plot1.show()

plot2 = ggplot(dwellings_ml, aes(x='yrbuilt', y='abstrprd')) + \
    geom_boxplot() + ggtitle('Abstract Present vs. Year Built')

plot2.show()

print("Based on these two graphs, live area is much more strongly related to year built than abstrprd. The live area graph shows that homes built after 1980 reach live areas above 5,000, while homes built before 1980 usually stay below that. The abstrprd graph shows a few outliers, but the values stay consistent throughout the years.")
Based on these two graphs, live area is much more strongly related to year built than abstrprd. The live area graph shows that homes built after 1980 reach live areas above 5,000, while homes built before 1980 usually stay below that. The abstrprd graph shows a few outliers, but the values stay consistent throughout the years.
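
A reading taken from a scatter plot can be cross-checked with a grouped summary. The sketch below assumes before1980 marks homes with yrbuilt earlier than 1980 and uses a small made-up table (the hypothetical name homes) in place of dwellings_ml, so the numbers it prints say nothing about the real data.

```python
import pandas as pd

# Hypothetical rows standing in for dwellings_ml; the values are invented.
homes = pd.DataFrame({
    "yrbuilt":  [1950, 1962, 1975, 1988, 1999, 2006],
    "livearea": [1100, 1400, 1300, 2200, 2600, 3100],
})

# Assumption: before1980 is 1 when the home was built earlier than 1980.
homes["before1980"] = (homes["yrbuilt"] < 1980).astype(int)

# Median and maximum live area for each group (0 = 1980 or later, 1 = before 1980).
summary = homes.groupby("before1980")["livearea"].agg(["median", "max"])
print(summary)
```

Running the same groupby on dwellings_ml would show whether the "above 5,000" pattern seen in the chart holds for the group maximums and whether the medians differ as well.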

QUESTION 2

Create at least one other chart to examine a variable(s) you thought might be important but apparently was not. The chart should show its relationship to the year built. Describe what you learn from the chart about how that variable is related to year built. Explain why you think it was not (very) important in the model.


# Scatter plot of number of bedrooms against year built

plot3 = ggplot(dwellings_ml, aes(x='yrbuilt', y='numbdrm')) + \
    geom_point(alpha=0.3) + ggtitle('Number of Bedrooms vs. Year Built')

plot3.show()

print("I expected the number of bedrooms to be an important predictor of whether a house was built before 1980. However, the graph shows no clear relationship between year built and the number of bedrooms. It only shows that homes with 2-4 bedrooms were the most common across all years. Because bedroom counts look about the same before and after 1980, the variable likely gives the model little to separate the two groups.")
I expected the number of bedrooms to be an important predictor of whether a house was built before 1980. However, the graph shows no clear relationship between year built and the number of bedrooms. It only shows that homes with 2-4 bedrooms were the most common across all years. Because bedroom counts look about the same before and after 1980, the variable likely gives the model little to separate the two groups.
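
A "no relationship" reading from a chart can be backed by a correlation coefficient. The sketch below computes the Pearson correlation of each column with yrbuilt on a tiny made-up table; the frame homes and the name corr_with_year are hypothetical, and the values are constructed so that one column tracks the year while the other does not.

```python
import pandas as pd

# Hypothetical rows; chosen so livearea rises with yrbuilt and numbdrm does not.
homes = pd.DataFrame({
    "yrbuilt":  [1950, 1960, 1970, 1980, 1990, 2000],
    "livearea": [1000, 1200, 1400, 1600, 1800, 2000],
    "numbdrm":  [3, 2, 4, 4, 2, 3],
})

# Pearson correlation of every other column with yrbuilt.
corr_with_year = homes.corr()["yrbuilt"].drop("yrbuilt")
print(corr_with_year)
```

On dwellings_ml, a value near zero for numbdrm would support the chart-based conclusion, with the caveat that Pearson correlation only measures linear association.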